Towards High Dimensional Data Mining with Boosting of PSVM and Visualization Tools

Authors

  • Thanh-Nghi Do
  • François Poulet
Abstract

We present a new supervised classification algorithm that applies boosting to support vector machines (SVMs) and can handle very large data sets. Training an SVM usually requires solving a quadratic program, so learning on large data sets demands a large amount of memory and a long training time. The proximal SVM (PSVM) proposed by Fung and Mangasarian is an alternative SVM formulation that is very fast to train because it requires only the solution of a linear system. We use the Sherman-Morrison-Woodbury formula to adapt PSVM to data sets with a very large number of attributes. We extend this idea by applying boosting to PSVM in order to mine massive data sets that have both a very large number of data points and a very large number of attributes. We evaluate its performance on several large data sets. We also propose a new graphical tool that helps interpret the results of the new algorithm by displaying the separating frontier between the classes of the data set. This can help the user understand in depth how the new algorithm works.
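As a rough illustration of why PSVM training reduces to a linear system, and of how the Sherman-Morrison-Woodbury identity helps when attributes outnumber data points, here is a minimal sketch. It assumes the standard linear PSVM formulation of Fung and Mangasarian, which is not necessarily the exact variant used in the paper. The function names (`psvm_primal`, `psvm_smw`, `psvm_fit`, `psvm_predict`) and the parameter name `nu` are hypothetical, and boosting is not shown.

```python
import numpy as np

def _augment(A):
    # E = [A, -e]: append a column of -1 so the offset gamma is learned with w.
    return np.hstack([A, -np.ones((A.shape[0], 1))])

def psvm_primal(A, y, nu=1.0):
    """Solve the (n+1) x (n+1) system (I/nu + E^T E) x = E^T y."""
    E = _augment(A)
    H = np.eye(E.shape[1]) / nu + E.T @ E
    x = np.linalg.solve(H, E.T @ y)
    return x[:-1], x[-1]          # weights w, offset gamma

def psvm_smw(A, y, nu=1.0):
    """Same solution via an m x m system, using the identity
    (I/nu + E^T E)^{-1} E^T = E^T (I/nu + E E^T)^{-1},
    which follows from the Sherman-Morrison-Woodbury formula."""
    E = _augment(A)
    K = np.eye(E.shape[0]) / nu + E @ E.T
    x = E.T @ np.linalg.solve(K, y)
    return x[:-1], x[-1]

def psvm_fit(A, y, nu=1.0):
    """Pick whichever linear system is smaller (assumed heuristic)."""
    m, n = A.shape
    return psvm_primal(A, y, nu) if n + 1 <= m else psvm_smw(A, y, nu)

def psvm_predict(A, w, gamma):
    # Classify by the side of the plane x^T w = gamma.
    return np.sign(A @ w - gamma)
```

With labels `y` in {-1, +1} and a data matrix `A` of shape (m, n), `psvm_fit(A, y)` returns the separating plane. The two solvers give the same result up to rounding, so the choice only affects the size of the system that is solved.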

Similar resources

Identification of the Patient Requirements Using Lean Six Sigma and Data Mining

Lean health care is one of the new management approaches that put the patient at the core of each change. Lean construction is based on visualization for understanding and prioritizing improvements. By using only visualization techniques, much important information could be missed. In order to prioritize and select improvements, it’s essential to integrate new analysis tools to achieve a good unders...

Evaluation of Data Mining Algorithms for Detection of Liver Disease

Background and Aim: The liver, as one of the largest internal organs in the body, is responsible for many vital functions including purifying blood, regulating the body's hormones, preserving glucose, and the body. Therefore, disruptions in the functioning of these problems will sometimes be irreparable. Early prediction of these diseases will help their early and effective treatm...

MineSet: An Integrated System for Data Mining

MineSet™, Silicon Graphics' interactive system for data mining, integrates three powerful technologies: database access, analytical data mining, and data visualization. It supports the knowledge discovery process from data access and preparation through iterative analysis and visualization to deployment. MineSet is based on a client-server architecture that scales to large databases. The dat...

Challenges in High Dimensional Data Visualization

High-dimensional real data sets need to be mined for applications such as social networks, bioinformatics, and many business-critical applications. A major challenge in mining such data is the lack of tools to comprehend both the data and the patterns mined from the data. Traditionally, data visualization helps in comprehending the data and validating the mined patterns especially for two and th...

Towards a flexible system for exploratory spatio-temporal data mining and visualization

Many natural phenomena present intrinsic spatial and temporal characteristics. With the recent advances in data collection technologies, high resolution spatio-temporal datasets can be stored and analyzed to accurately study the behavior of such events. However, these datasets are often very large and difficult to analyze and display. Recently much attention has been dedicated to the applicatio...

Publication date: 2004